**Notes**:

* Closed Text and Closed Notes Exam
* Time: 3:30 pm – 4:45 pm
* There are **5** questions. Answer all questions.
* Maximum points: 50.
* Answer in clear and legible handwriting. Partial credit will be given.

| **I** | **II** | **III** | **IV** | **V** | **Total** |
| --- | --- | --- | --- | --- | --- |
|  |  |  |  |  |  |

I. (10 pts.) Short questions, fill in the blanks, true or false.

1. What are the power wall and the memory wall in the context of computer architecture?
2. MIPS is an accurate measure of computer performance. True/False
3. Suppose a new processor is 10 times faster in computation than the current one. Assume that the current processor is busy with computation 40% of the time and idle waiting for I/O 60% of the time. What is the overall speedup with the new processor?
4. Briefly state the two principles of locality.
5. A program consisting of 500 instructions is executed on a 5-stage processor. How many cycles are required to complete the program? Assume ideal overlap in the case of pipelining. What is the speedup with pipelining?
6. TLB stands for \_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_

   RISC stands for \_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_
7. What type of architecture is RISC-V? \_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_
8. Branches comprise 20% of all instructions (BF). Branch prediction is 80% accurate (BPA). Each misprediction incurs a 2-cycle stall (SP). Compute the average branch penalty that control hazards add to the CPI of the pipelined processor.
9. In the \_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_ scheme, the CPU is given the requested instruction as soon as it is fetched from the lower level of memory (instead of waiting for the entire block to be transferred).
10. Circle appropriately: For non-blocking caches, the bandwidth is \_\_\_\_**better / worse / no change**\_\_\_\_\_, miss penalty is \_\_**better / worse**\_\_\_\_\_\_\_\_\_, hardware cost is \_\_\_\_**low / high / none**\_\_\_\_\_\_\_\_
II. (10 pts) Multiple choice questions. More than one choice may be correct. Credit is given only if you choose all correct choices.

1. Which of the following is a true measure of computer performance?

   (a) CPI

   (b) Clock rate

   (c) MIPS

   (d) None of the above.

2. Which of the following are power reduction techniques?

   (a) Voltage scaling

   (b) Frequency scaling

   (c) Increasing clock frequency

   (d) None of the above.

3. The write-through scheme can be improved by:

   (a) Write buffer (b) Critical word first (c) Write-back (d) Dirty blocks

4. Which of the following can improve cache performance?

   (a) Smaller block size (b) Higher associativity (c) Multi-level cache (d) None of the above.

5. The main advantage of the merging write buffer optimization is that it:

   (a) reduces hit time

   (b) reduces miss penalty

   (c) reduces miss rate

   (d) None of the above.

6. Which of the following page replacement algorithms is popular in processors?

   (a) Least Recently Used (b) Random (c) Most Recently Used (d) None of the above.

7. To keep page table size under control, we can use:

   (a) Larger virtual address (b) Multi-level page table (c) Larger physical address (d) None of the above.

8. In an in-order processor, the following hazards are not an issue:

   (a) RAW (b) WAW (c) WAR (d) None of the above.

9. The drawback(s) of loop unrolling is/are:

   (a) register shortfall (b) decrease in code size (c) some ld/sd instruction dependences cannot be identified (d) None of the above.

10. Assuming an in-order pipeline with data forwarding, the following code has:

    ```
    Loop: fld    f0, 0(x1)
          fadd.d f4, f0, f2
          fsd    f4, 0(x1)
          addi   x1, x1, -8
          bne    x1, x2, Loop
    ```

    (a) data hazard (b) control hazard (c) structural hazard (d) None of the above.

III. **Technology Trends, Performance Measurement**

1. (4 pts) Briefly describe how each of the following ISA classes works: stack, accumulator, register-memory, register-register. (No figure is needed.)
2. Calculate the effective CPI for a RISC-V CPU. Assume the following cycle counts: all ALU operations, 1 cycle; loads, 5 cycles; stores, 3 cycles; taken branches, 5 cycles; not-taken branches, 3 cycles; jumps, 3 cycles.

   The benchmark data for gcc is: 17% loads, 23% stores, 20% branches, 4% jumps, and ALU operations for the rest.

IV. **Memory Hierarchy**

(4 pts) Briefly (in 2-3 sentences each) describe **any four** advanced cache optimization techniques.

(6 pts) A cache acts as a filter. For example, for every 1000 instructions of a program, an average of 20 memory accesses may exhibit low enough locality that they cannot be serviced by a 2MB cache. The 2MB cache is then said to have an MPKI (misses per thousand instructions) of 20, and this will be largely true regardless of the smaller caches that precede the 2MB cache. Assume the following cache size/latency (cycles)/MPKI values: 32KB/1/100, 128KB/2/80, 512KB/4/50, 2MB/8/40, 8MB/1/10. Assume that accessing the off-chip memory system requires 200 cycles on average. For the following cache configuration, calculate the average time spent accessing the cache hierarchy: 32KB L1; 8MB L2; off-chip memory.

V. **Basic Pipelining and RISC-V Architecture**

(6 pts) A. Suppose the branch frequencies (as percentages of all instructions) are as follows:

* Conditional branches: 15%
* Jumps/calls: 1%
* Taken conditional branches: 60% are taken

We are examining a 4-stage pipeline where the branch is resolved at the end of the second cycle for unconditional branches and at the end of the third cycle for conditional branches. Assuming that only the first pipe stage can always be completed regardless of whether the branch is taken, and ignoring other pipeline stalls, how much faster would the machine be without any branch hazards?

(4 pts) B. Answer the following about the RISC-V base ISA:

a) Name any *four* ISA design principles that RISC-V implements.

b) No. of integer registers \_\_\_\_\_\_\_\_\_\_\_\_\_\_\_ No. of FP registers \_\_\_\_\_\_\_\_\_\_\_\_\_\_

c) Instruction word length \_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_

d) No. of instruction formats \_\_\_\_\_\_\_\_\_\_\_\_\_

**Formulas**

1. If X is *n* times as fast as Y, then:
2. Amdahl’s Law:
3. For multilevel caches:
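
For reference, the standard textbook forms of these three formulas are sketched below. The symbol names are assumptions that follow common Hennessy and Patterson notation, and they may differ from the notation used in class.

```latex
\begin{align*}
% Relative performance: X is n times as fast as Y
\frac{\text{Execution time}_Y}{\text{Execution time}_X}
  &= \frac{\text{Performance}_X}{\text{Performance}_Y} = n \\[6pt]
% Amdahl's Law: overall speedup when a fraction of execution time is enhanced
\text{Speedup}_{\text{overall}}
  &= \frac{1}{(1 - \text{Fraction}_{\text{enhanced}})
     + \dfrac{\text{Fraction}_{\text{enhanced}}}{\text{Speedup}_{\text{enhanced}}}} \\[6pt]
% Average memory access time for a two-level cache hierarchy
\text{AMAT}
  &= \text{Hit time}_{L1} + \text{Miss rate}_{L1} \times
     \left(\text{Hit time}_{L2} + \text{Miss rate}_{L2} \times \text{Miss penalty}_{L2}\right)
\end{align*}
```

Here the L2 miss rate is the local miss rate (L2 misses divided by L2 accesses), and the L2 miss penalty is the time to access the next level, such as off-chip memory.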